Converting the TüBa-D/Z Treebank of German to Universal Dependencies

Authors

  • Çagri Çöltekin
  • Ben Campbell
  • Erhard W. Hinrichs
  • Heike Telljohann
Abstract

This paper describes the conversion of TüBa-D/Z, one of the major German constituency treebanks, to Universal Dependencies. Besides the automatic conversion process, we describe manual annotation of a small part of the treebank based on the UD annotation scheme for the purposes of evaluating the automatic conversion. The automatic conversion shows fairly high agreement with the manual annotations.

Similar resources

Treebank Profiling of Spoken and Written German

This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper ‘die tageszeitung’ (taz). The approach can be used more generally as a means of disti...

What Linguists Always Wanted to Know about German and Did not Know How to Estimate

This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper ‘die tageszeitung’ (taz). The approach can be used more generally as a means of dis...

Is it Really that Difficult to Parse German?

This paper presents a comparative study of probabilistic treebank parsing of German, using the Negra and TüBa-D/Z treebanks. Experiments with the Stanford parser, which uses a factored PCFG and dependency model, show that, contrary to previous claims for other parsers, lexicalization of PCFG models boosts parsing performance for both treebanks. The experiments also show that there is a big diff...

TüBa-D/W: a large dependency treebank for German

We introduce a large, automatically annotated treebank, based on the German Wikipedia. The treebank contains part-of-speech, lemma, morphological, and dependency annotations for the German Wikipedia (615 million tokens). The treebank follows common annotation standards for the annotation of German text, such as the STTS part-of-speech tag set, TIGER morphology and TüBa-D/Z dependency structure.

What Treebanks Can Do For You: Rule-based and Machine-learning Approaches to Anaphora Resolution in German

This paper compares two approaches to computational anaphora resolution for German: (i) an adaptation of the rule-based RAP algorithm that was originally developed for English by Lappin and Leass, and (ii) a hybrid system for anaphora resolution that combines a rule-based pre-filtering component with a memory-based resolution module. The data source is provided by the TüBa-D/Z treebank of German...

Publication date: 2017